1. Introduction. Sensitive data, such as passwords, are usually stored by means of a one-way hash function, so that checking an input against the stored data amounts to comparing the stored hash with the hash of the new input. In many contexts, though, this simple procedure cannot be used due to the noisy or fuzzy nature of the data. A prominent example is the storage of biometric data, e.g. in the form of fingerprint, iris, voice or DNA, for the purpose of authentication: the data derived from different acquisitions of the same biometric feature can differ slightly from each other, and the biometric feature itself can change slightly for various reasons [11]. Therefore, a certain error tolerance is needed to distinguish legitimate from non-legitimate users, but this prevents the standard use of collision-resistant hash functions [5].

This problem has led to the proposal of systems for the secure storage of biometric passwords (see [10] for a selected survey of the literature), which essentially act as "tolerant" hash functions. Most of these methods combine error-correcting codes and hash functions, following the model of the fuzzy commitment scheme [6].

A scheme that is seemingly just the dual of this one is the syndrome fuzzy hashing construction: in [1] we showed that it offers several advantages over the fuzzy commitment scheme, in particular as far as information leakage is concerned.

The fuzzy commitment scheme was later generalized to other metrics, such as the set difference metric [5] and the edit distance metric [3].

In particular, the fuzzy vault [5] uses polynomial interpolation to allow authentication based on the matching of a sufficient number of features, while the fuzzy extractor [3] is a further generalization which combines the previous constructions with particular objects called randomness extractors. These make the previous schemes stronger with respect to information leakage, although they cannot prevent it [an].

The choice of one scheme over another depends not only on the application or model scenario, but also on other important issues, such as privacy concerns, as already mentioned, or suitability for implementation, as we already discussed in [...].

In this paper we slightly modify the model scenario: we tolerate up to, say, $\ell$ error bursts of maximum length $b$, and possibly also assume the presence of other random errors. This assumption is realistic in many contexts where errors are likely to appear in bursts, and it leads us to focus on, and take advantage of, burst-error-correcting codes. We investigate different choices of codes, depending on the type and size of the bursts, and elaborate on their key features, such as error-correcting capability and decoding complexity.

The paper is structured as follows: in Section 2 we review the syndrome fuzzy hashing construction, which serves as our model scheme. The rest of the paper is devoted to solutions using burst-error-correcting codes: we deal with base field representations of Reed-Solomon codes in Section 3 and with concatenated codes in Section 4. Finally, Section 5 presents some conclusions.

2. Syndrome fuzzy hashing and burst error correction. We review here the syndrome fuzzy hashing construction, as presented in [1].

Suppose we need a tolerance of $e$ errors. Then an $[n,k]$-linear block code $C \subseteq \mathbb{F}_q^n$ able to correct $e$ errors is selected, and it is described through its $r \times n$ parity-check matrix $H$, with $r = n - k$. Given a data vector $x$ to be stored, the pair $(Ha(x), Hx)$ is used to represent $x$, where $Ha$ is a given hash function. When another vector $y$ is acquired and compared with $x$, the value $Hx - Hy = H(x - y) = Hv$ is computed, which coincides with the syndrome associated to the difference vector $v = x - y$. Then syndrome decoding is applied to $Hv$, according to the chosen code $C$. If $d(x,y) \le e$, then $v$, which has Hamming weight equal to $d(x,y)$, corresponds to a correctable error vector. So syndrome decoding succeeds and correctly returns $v$. Then, starting from $v$ and $y$, $x = v + y$ can be computed, as well as $Ha(x)$. The latter coincides with the stored value, so authentication succeeds. Otherwise, syndrome decoding fails or returns some $w \neq v$. In that case $x' = w + y \neq x$ and $Ha(x') \neq Ha(x)$ are obtained, and authentication fails.
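To make the enrolment and verification steps concrete, here is a minimal Python sketch under toy assumptions that are not part of the construction itself: the code $C$ is the binary $[7,4]$ Hamming code (so $e = 1$), and SHA-256 from Python's `hashlib` stands in for $Ha$. The helper names (`enroll`, `authenticate`, `decode_syndrome`, ...) are hypothetical.

```python
import hashlib

# Toy parity-check matrix of the binary [7,4] Hamming code (corrects e = 1 error):
# column j (0-based) is the binary representation of j + 1, least significant bit first.
H = [[((j + 1) >> bit) & 1 for j in range(7)] for bit in range(3)]


def syndrome(v):
    """Compute H v over GF(2)."""
    return tuple(sum(h * x for h, x in zip(row, v)) % 2 for row in H)


def ha(x):
    """Stand-in for the hash function Ha (SHA-256 is an assumption)."""
    return hashlib.sha256(bytes(x)).hexdigest()


def decode_syndrome(s):
    """Hamming syndrome decoding: the syndrome, read as a binary number,
    is the 1-based position of the single error (0 means no error)."""
    pos = sum(bit << i for i, bit in enumerate(s))
    v = [0] * 7
    if pos:
        v[pos - 1] = 1
    return v


def enroll(x):
    """Store the pair (Ha(x), Hx) instead of x."""
    return ha(x), syndrome(x)


def authenticate(record, y):
    """Recover a candidate x' = v + y from Hx - Hy = Hv and compare hashes."""
    stored_hash, hx = record
    s = tuple((a - b) % 2 for a, b in zip(hx, syndrome(y)))
    v = decode_syndrome(s)
    x_candidate = [(vi + yi) % 2 for vi, yi in zip(v, y)]
    return ha(x_candidate) == stored_hash
```

A single flipped bit in $y$ is absorbed by syndrome decoding, while two flipped bits make the decoder return some $w \neq v$, so the recomputed hash no longer matches the stored one.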

As stated before, this construction is the dual version of the fuzzy commitment 
scheme, but offers better security in terms of reduced information leakage. 

One can transform an $[n,k]$-linear block code into a two-dimensional (array) code by writing the codewords into an $n_1 \times n_2$ array such that $n_1 n_2 = n$. A simple way of doing so is writing the vector entries into the array row by row. For syndrome decoding, such a two-dimensional code is transformed back into the vector representation of the original linear block code and can be used as explained before.
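The row-by-row transformation and its inverse can be sketched in Python as follows; the function names are hypothetical, and in the scheme the syndrome decoder is only ever applied to the flattened vector.

```python
def to_array(vec, n1, n2):
    """Write a length-n vector row by row into an n1 x n2 array (n1 * n2 = n)."""
    if n1 * n2 != len(vec):
        raise ValueError("n1 * n2 must equal the code length n")
    return [list(vec[r * n2:(r + 1) * n2]) for r in range(n1)]


def to_vector(arr):
    """Read the array row by row back into the original vector."""
    return [entry for row in arr for entry in row]
```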

Definition 2.1.

1. A one-dimensional burst (error) of length $m$ is a vector of length $m$ over $\mathbb{F}_q$ such that the first and last entries are non-zero.

2. A two-dimensional rectangular burst (error) of size $m \times m'$ is an $m \times m'$ matrix over $\mathbb{F}_q$ such that the first and last columns and the first and last rows each contain a non-zero element.

Note that a one-dimensional burst may occur horizontally or vertically in a two-dimensional code.
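Given an error pattern, the smallest burst in the sense of Definition 2.1 that contains it is determined by its first and last non-zero positions (respectively rows and columns). A small Python sketch with hypothetical helper names:

```python
def burst_length(v):
    """Length of the shortest one-dimensional burst containing all non-zero
    entries of v: last non-zero index - first non-zero index + 1 (0 if v = 0)."""
    nz = [i for i, x in enumerate(v) if x != 0]
    return nz[-1] - nz[0] + 1 if nz else 0


def burst_size(mat):
    """Size (rows, cols) of the smallest rectangular burst containing all
    non-zero entries of a 2D error pattern, or (0, 0) for the zero pattern."""
    cells = [(r, c) for r, row in enumerate(mat) for c, x in enumerate(row) if x != 0]
    if not cells:
        return (0, 0)
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (max(rows) - min(rows) + 1, max(cols) - min(cols) + 1)
```

A vertical one-dimensional burst of length $m$ in an array then shows up as a result of size $(m, 1)$.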

Burst errors of this type appear in many real-life applications, so that in many 
contexts the syndrome fuzzy hashing construction can exploit the power of burst error 
correcting codes. 

3. Base field representations of Reed-Solomon codes. Reed-Solomon codes are optimal codes defined over an extension field $\mathbb{F}_{q^m}$. When the entries are expanded over the base field, one gets a code over $\mathbb{F}_q$ that may be used for burst error correction in different ways.

Definition 3.1. Consider the extension field $\mathbb{F}_{q^m}$ and $n = q^m - 1$ distinct elements $x_1, \ldots, x_n \in \mathbb{F}_{q^m}$, and let $0 < k < n$. Then

$$RS = \{ (f(x_1), \ldots, f(x_n)) \mid f \in \mathbb{F}_{q^m}[x],\ \deg(f) < k \} \subseteq \mathbb{F}_{q^m}^n$$

is called a Reed-Solomon (RS) code. $RS$ has length $n$, dimension $k$ and minimum distance $d = n - k + 1$, which means that it is optimal (MDS).
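As a sanity check of the MDS property, the following Python sketch builds a tiny RS code by evaluation, assuming $q = 2$, $m = 3$ (so $n = 7$), $\mathbb{F}_8$ represented as $\mathbb{F}_2[x]/(x^3 + x + 1)$, and the non-zero field elements as evaluation points $x_1, \ldots, x_7$. These choices are illustrative only, and exhaustive search is feasible just for such toy sizes.

```python
from itertools import product

MOD = 0b1011  # x^3 + x + 1, irreducible over GF(2)


def gf8_mul(a, b):
    """Multiply two elements of GF(8) (3-bit ints, bit i = coefficient of x^i)."""
    r = 0
    for i in range(3):
        if (b >> i) & 1:
            r ^= a << i
    for i in (4, 3):  # reduce the degree-4 and then the degree-3 term modulo MOD
        if (r >> i) & 1:
            r ^= MOD << (i - 3)
    return r


def poly_eval(coeffs, x):
    """Evaluate f = coeffs[0] + coeffs[1] X + ... at x with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf8_mul(acc, x) ^ c  # addition in GF(8) is bitwise XOR
    return acc


def rs_encode(coeffs, points=range(1, 8)):
    """Codeword (f(x_1), ..., f(x_n)) for a polynomial f with deg f < k."""
    return [poly_eval(coeffs, x) for x in points]


def min_distance(k):
    """Minimum Hamming weight over all non-zero codewords (= d by linearity)."""
    return min(sum(1 for s in rs_encode(list(f)) if s)
               for f in product(range(8), repeat=k) if any(f))
```

For $k = 3$ the search finds $d = 5 = n - k + 1$, since a non-zero polynomial of degree at most 2 vanishes at no more than two of the seven points.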

Expanding as row vectors (Construction I). When expanding the elements of $\mathbb{F}_{q^m}$ in a Reed-Solomon code as elements of $\mathbb{F}_q^m$ via an $\mathbb{F}_q$-isomorphism $\mathbb{F}_{q^m} \cong \mathbb{F}_q^m$, one gets a $q$-ary linear code of length $n' = nm = m(q^m - 1)$ and dimension $k' = mk$. The minimum distance satisfies $d' \ge d$, but the burst-error correction capability is higher than the random error correction capability. In fact, since up to $\lfloor (n-k)/2 \rfloor$ corrupted symbols from $\mathbb{F}_{q^m}$ can be corrected, and a burst of length $m(j-1)+1$ touches at most $j$ consecutive symbols, it is readily seen that such a code is able to correct

• any single burst of length $\le m\left(\lfloor (n-k)/2 \rfloor - 1\right) + 1$, or

• any $\ell$ bursts of length $\le m\left(\lfloor (n-k)/(2\ell) \rfloor - 1\right) + 1$.
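The bullet bounds can be checked numerically. The Python sketch below (hypothetical helper names) evaluates $m(\lfloor (n-k)/(2\ell) \rfloor - 1) + 1$ and, by brute force over all start positions, counts how many consecutive $m$-digit blocks a one-dimensional burst of a given length can touch in the expanded word of length $nm$.

```python
def construction_i_bound(n, k, m, ell=1):
    """Burst length guaranteed by Construction I for ell bursts:
    m * (floor((n - k) / (2 * ell)) - 1) + 1 (0 if no symbol error is correctable)."""
    t = (n - k) // (2 * ell)
    return m * (t - 1) + 1 if t >= 1 else 0


def max_symbols_hit(length, m, n):
    """Largest number of m-digit symbols touched by a burst of the given length,
    over all start positions in the expanded word of n * m digits."""
    return max((start + length - 1) // m - start // m + 1
               for start in range(n * m - length + 1))
```

For the $[15, 7]$ RS code over $\mathbb{F}_{16}$ ($m = 4$), a burst of 13 binary digits touches at most 4 symbols, which the RS decoder can correct, while a burst of 14 digits may already touch 5.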

In the binary case, this expansion of codes over $\mathbb{F}_{2^m}$ into binary codes can be found in [7, Ch. 10 §5]. There one can also find that adding a parity check bit at the end of each $m$-vector results in a code of length $n' = (m+1)(2^m - 1)$, dimension $k' = mk$ and minimum distance $d' \ge 2d$. The analogue holds for $q$-ary codes. The higher minimum distance means that one can correct more random errors with this variation of the construction.
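The reason for $d' \ge 2d$ is that every non-zero $m$-vector, extended by its parity check symbol, has weight at least 2, while a non-zero RS codeword has at least $d$ non-zero symbols. A brute-force check of the first fact in Python, assuming $q$ prime so that arithmetic modulo $q$ is arithmetic in $\mathbb{F}_q$ (function names hypothetical):

```python
from itertools import product


def extend_with_parity(vec, q=2):
    """Append the symbol making the coordinate sum 0 mod q (q assumed prime)."""
    return list(vec) + [(-sum(vec)) % q]


def min_extended_weight(m, q=2):
    """Smallest Hamming weight of an extended non-zero m-vector over F_q."""
    return min(sum(1 for x in extend_with_parity(v, q) if x)
               for v in product(range(q), repeat=m) if any(v))
```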

Expanding as matrices (Constructions II and III). RS codes in $q$-ary representation can also be exploited if one wants to compare two data matrices (or higher-dimensional data) and the difference pattern is likely to consist of two-dimensional bursts.

To obtain a two-dimensional $q$-ary representation of Reed-Solomon codes, it may be convenient to write each symbol from $\mathbb{F}_{q^m}$ into a rectangular or square array. This can be done in different ways, e.g. row-wise, column-wise, in spiral shapes etc., which makes no difference to the error-correction capability. Moreover, one can write the RS code itself (row-wise) into an $n_1 \times n_2$ matrix such that $n_1 n_2 = n$. If $m$ is a square and we expand the symbols from $\mathbb{F}_{q^m}$ into squares of side length $\sqrt{m}$, then one gets a $q$-ary array code of size $n_1\sqrt{m} \times n_2\sqrt{m}$.
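The index bookkeeping of this two-dimensional expansion can be sketched as a map from (RS symbol, $q$-ary digit) to array coordinates. The sketch assumes row-wise filling both of the $n_1 \times n_2$ symbol grid and of each $\sqrt{m} \times \sqrt{m}$ square; as noted above, other filling orders do not change the error-correction capability. The function name is hypothetical.

```python
from math import isqrt


def array_position(sym, digit, n2, m):
    """0-based (row, col) of q-ary digit `digit` (0..m-1) of RS symbol `sym`
    (0..n-1) in the n1*sqrt(m) x n2*sqrt(m) array, filling row-wise."""
    s = isqrt(m)
    if s * s != m:
        raise ValueError("m must be a perfect square")
    block_row, block_col = divmod(sym, n2)  # position of the symbol in the n1 x n2 grid
    r, c = divmod(digit, s)                 # position of the digit inside its square
    return block_row * s + r, block_col * s + c
```

With $n = 15 = 3 \cdot 5$ and $m = 4$, the 60 binary digits fill a $6 \times 10$ array exactly once.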

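Theorem 3.2. A $q$-ary RS code with the parameters from above is able to correct

• any single square burst of area $\le \left(\sqrt{m}\left(\left\lfloor \sqrt{\lfloor \frac{n-k}{2} \rfloor} \right\rfloor - 1\right) + 1\right)^2$, or

• any $\ell$ square bursts of area $\le \left(\sqrt{m}\left(\left\lfloor \sqrt{\lfloor \frac{n-k}{2\ell} \rfloor} \right\rfloor - 1\right) + 1\right)^2$, or

• any single one-dimensional burst of length $\le \sqrt{m}\left(\lfloor \frac{n-k}{2} \rfloor - 1\right) + 1$, or

• any $\ell$ one-dimensional bursts of length $\le \sqrt{m}\left(\lfloor \frac{n-k}{2\ell} \rfloor - 1\right) + 1$.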



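Proof. The one-dimensional burst error correction capability follows as in the row vector expansion case, with the block width $\sqrt{m}$ in place of $m$; thus it remains to treat the square burst case.

Let $X$ denote the side length of the burst. The burst corrupts the most extension field symbols of the code when, e.g., the entry in its upper left corner lies on the entry in the lower right corner of some expanded symbol. Then the burst corrupts at most

$$\left( \left\lceil \frac{X-1}{\sqrt{m}} \right\rceil + 1 \right)^2$$

extension field symbols. Since the number of corrupted extension field elements has to be less than or equal to the error-correction capability of the RS code, it has to hold that

$$\left( \left\lceil \frac{X-1}{\sqrt{m}} \right\rceil + 1 \right)^2 \le \left\lfloor \frac{n-k}{2} \right\rfloor \iff X \le \sqrt{m}\left( \left\lfloor \sqrt{\left\lfloor \frac{n-k}{2} \right\rfloor} \right\rfloor - 1 \right) + 1,$$

which implies the formula. In the case that there are $\ell$ square bursts it has to hold that

$$\ell \left( \left\lceil \frac{X-1}{\sqrt{m}} \right\rceil + 1 \right)^2 \le \left\lfloor \frac{n-k}{2} \right\rfloor,$$

which is the case if $\left( \left\lceil \frac{X-1}{\sqrt{m}} \right\rceil + 1 \right)^2 \le \left\lfloor \frac{n-k}{2\ell} \right\rfloor$; the second formula then follows in the same way.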
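The counting argument in the proof can be checked by brute force. The Python sketch below (hypothetical names) computes the side length $\sqrt{m}(\lfloor \sqrt{\lfloor (n-k)/(2\ell) \rfloor} \rfloor - 1) + 1$ from Theorem 3.2 and the maximum number of $\sqrt{m} \times \sqrt{m}$ blocks that a square burst of a given side can meet anywhere in the array.

```python
from math import isqrt


def square_burst_side(n, k, s, ell=1):
    """Side length X from Theorem 3.2, where s = sqrt(m) is the block side:
    X = s * (floor(sqrt(floor((n - k) / (2 * ell)))) - 1) + 1."""
    r = isqrt((n - k) // (2 * ell))
    return s * (r - 1) + 1 if r >= 1 else 0


def blocks_hit(r0, c0, side, s):
    """Number of s x s blocks met by a side x side burst with upper-left corner (r0, c0)."""
    rows = (r0 + side - 1) // s - r0 // s + 1
    cols = (c0 + side - 1) // s - c0 // s + 1
    return rows * cols


def max_blocks_hit(side, s, height, width):
    """Maximum of blocks_hit over all placements inside a height x width array."""
    return max(blocks_hit(r0, c0, side, s)
               for r0 in range(height - side + 1)
               for c0 in range(width - side + 1))
```

With $m = 4$ and the $[15, 7]$ RS code over $\mathbb{F}_{16}$ written as a $3 \times 5$ grid (a $6 \times 10$ binary array), squares of side 3 meet at most $4 = \lfloor (15-7)/2 \rfloor$ symbols, as the bound predicts, whereas squares of side 4 can meet 9.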



Note that, in analogy to Construction I, one can again add a parity check element to all vector expansions of the extension field elements to achieve a larger minimum distance.

Another way of expanding the elements of $\mathbb{F}_{q^m} = \mathbb{F}_q[\alpha]$ is to use companion matrices. For this, let $p(x) \in \mathbb{F}_q[x]$ be the minimal polynomial of $\alpha$ and $P \in \mathbb{F}_q^{m \times m}$ its companion matrix. Then $\mathbb{F}_{q^m} \cong \mathbb{F}_q[P]$, and we can represent the extension field elements by $q$-ary $m \times m$ matrices. If the RS code is written into an $n_1 \times n_2$ matrix as above, this results in a $q$-ary array code of size $n_1 m \times n_2 m$.
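A small Python check of the companion-matrix representation, assuming $q = 2$, $m = 3$ and the primitive polynomial $p(x) = x^3 + x + 1$ (an illustrative choice). It builds $P$ with the convention of ones on the subdiagonal and the negated coefficients of $p$ in the last column, and maps $c_0 + c_1\alpha + c_2\alpha^2$ to $c_0 I + c_1 P + c_2 P^2$. The function names are hypothetical.

```python
def companion(p):
    """Companion matrix over GF(2) of x^m + p[m-1] x^(m-1) + ... + p[0]."""
    m = len(p)
    P = [[0] * m for _ in range(m)]
    for i in range(1, m):
        P[i][i - 1] = 1          # ones on the subdiagonal
    for i in range(m):
        P[i][m - 1] = p[i] % 2   # last column holds -p_i, which equals p_i over GF(2)
    return P


def mat_mul(A, B):
    """Matrix product over GF(2)."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) % 2
             for j in range(len(B[0]))] for i in range(len(A))]


def identity(m):
    return [[int(i == j) for j in range(m)] for i in range(m)]


def to_matrix(c, powers):
    """Matrix c_0 I + c_1 P + ... + c_{m-1} P^(m-1) representing c_0 + c_1 a + ..."""
    m = len(c)
    return [[sum(c[l] * powers[l][i][j] for l in range(m)) % 2 for j in range(m)]
            for i in range(m)]
```

For this $p(x)$ the matrix $P$ has multiplicative order $7 = 2^3 - 1$, and the 8 matrices obtained from all coefficient vectors are distinct and closed under multiplication, as expected for $\mathbb{F}_2[P] \cong \mathbb{F}_8$.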

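Theorem 3.3. A $q$-ary RS code expanded with companion matrices with the parameters from above is able to correct

• any single square burst of area $\le \left(m\left(\left\lfloor \sqrt{\lfloor \frac{n-k}{2} \rfloor} \right\rfloor - 1\right) + 1\right)^2$, or

• any $\ell$ square bursts of area $\le \left(m\left(\left\lfloor \sqrt{\lfloor \frac{n-k}{2\ell} \rfloor} \right\rfloor - 1\right) + 1\right)^2$, or

• any single one-dimensional burst of length $\le m\left(\lfloor \frac{n-k}{2} \rfloor - 1\right) + 1$, or

• any $\ell$ one-dimensional bursts of length $\le m\left(\lfloor \frac{n-k}{2\ell} \rfloor - 1\right) + 1$.

The proof is analogous to the one of Theorem 3.2, with $m$ in place of $\sqrt{m}$, and hence omitted.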

This second expansion can offer a burst error correction capability comparable to that of the first one (considering $q$-ary expansions of comparable size), while starting from an RS code of smaller length and thus having lower decoding complexity. Its rate, however, is lower, since each extension field element, which carries $m$ $q$-ary digits of information, now occupies $m^2$ array entries instead of $m$; this may reduce the security of the scheme.
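The trade-off can be made concrete with a few lines of arithmetic. The Python sketch below (hypothetical function name) evaluates the single-square-burst side from Theorems 3.2 and 3.3, the number of $q$-ary array entries and the rate for an RS code of length $n = q^m - 1$; it assumes that the rate counts the $km$ information digits over all array entries.

```python
from fractions import Fraction
from math import isqrt


def capability(q, m, k, expansion):
    """Square-burst side (Theorem 3.2 for "II", Theorem 3.3 for "III"),
    number of q-ary array entries and rate for an RS code over F_{q^m}."""
    n = q ** m - 1
    t = (n - k) // 2
    if expansion == "II":
        s = isqrt(m)
        if s * s != m:
            raise ValueError("Construction II needs m to be a perfect square")
        entries = n * m               # each symbol -> sqrt(m) x sqrt(m) square
    elif expansion == "III":
        s = m
        entries = n * m * m           # each symbol -> m x m companion-matrix block
    else:
        raise ValueError("expansion must be 'II' or 'III'")
    r = isqrt(t)
    side = s * (r - 1) + 1 if r >= 1 else 0
    return {"n": n, "side": side, "entries": entries, "rate": Fraction(k * m, entries)}
```

For the same $[15, 7]$ RS code over $\mathbb{F}_{16}$, the companion-matrix expansion corrects larger square bursts (side 5 instead of 3) at a rate lower by a factor $m = 4$; obtaining blocks of side 4 with Construction II would instead require $m = 16$ and an RS code of length $2^{16} - 1$.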

4. Concatenated codes. When the error pattern is a mix of burst and random 
errors, concatenated codes provide the desired flexibility. 



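Definition 4.1. Let $C_{\mathrm{out}} : (\mathbb{F}_{q^k})^K \to (\mathbb{F}_{q^k})^N$ and $C_{\mathrm{in}}^{(i)} : \mathbb{F}_q^k \to \mathbb{F}_q^n$, for $i = 1, \ldots, N$, be encoding functions. The concatenated code $C = (C_{\mathrm{in}}^{(1)} \| C_{\mathrm{in}}^{(2)} \| \cdots \| C_{\mathrm{in}}^{(N)}) \circ C_{\mathrm{out}}$ is the $q$-ary code of length $Nn$ and dimension $Kk$ given by

$$C = \{ (C_{\mathrm{in}}^{(1)}(y_1), C_{\mathrm{in}}^{(2)}(y_2), \ldots, C_{\mathrm{in}}^{(N)}(y_N)) \mid (y_1, y_2, \ldots, y_N) = C_{\mathrm{out}}(x),\ x \in (\mathbb{F}_{q^k})^K \}.$$

Here we assume the isomorphism $\mathbb{F}_{q^k} \cong \mathbb{F}_q^k$, so that the domains of the maps $C_{\mathrm{in}}^{(i)}$ agree with the alphabet of $C_{\mathrm{out}}$. The codes $C_{\mathrm{in}}^{(1)}, \ldots, C_{\mathrm{in}}^{(N)}$ are called the inner codes and $C_{\mathrm{out}}$ is called the outer code. The decoding process is a two-step process in which the inner codes are decoded first, followed by the outer code. Suppose that the inner codes are identical, that is

$$C = (\underbrace{C_{\mathrm{in}} \| C_{\mathrm{in}} \| \cdots \| C_{\mathrm{in}}}_{N \text{ times}}) \circ C_{\mathrm{out}},$$

and we can represent an element $c \in C$ as

$$c = (c_{1,1}, \ldots, c_{1,n} \| \cdots \| c_{N,1}, \ldots, c_{N,n}). \tag{4.1}$$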
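The encoding map of Definition 4.1 can be sketched in Python with toy component codes, which are assumptions chosen only for illustration: $q = 2$, $k = 4$, $n = 7$, the outer code is the single parity-check code over $\mathbb{F}_{16}$ (addition in $\mathbb{F}_{16}$ is bitwise XOR on 4-bit vectors), and every inner code is the binary $[7,4]$ Hamming code. The function names are hypothetical.

```python
def outer_encode(xs):
    """Toy outer code: single parity check over F_16 (4-bit ints, addition = XOR).
    Maps K symbols to N = K + 1 symbols."""
    par = 0
    for x in xs:
        par ^= x
    return list(xs) + [par]


def inner_encode(y):
    """Toy inner code: binary [7,4] Hamming code applied to the 4 bits of y,
    identifying F_16 with F_2^4 via the bit representation."""
    d1, d2, d3, d4 = ((y >> i) & 1 for i in range(4))
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def concat_encode(xs):
    """c = (C_in(y_1) || ... || C_in(y_N)) with (y_1, ..., y_N) = C_out(x)."""
    return [bit for y in outer_encode(xs) for bit in inner_encode(y)]
```

With $K$ outer information symbols the result has length $Nn = 7(K+1)$ and dimension $Kk = 4K$; for $K = 2$ there are $2^8 = 256$ distinct codewords.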

Assume that $C_{\mathrm{in}}$ and $C_{\mathrm{out}}$ have efficient decoding procedures. Suppose that the decoding procedure for $C_{\mathrm{in}}$ can correct $t$ errors and the decoding procedure for $C_{\mathrm{out}}$ can correct $s$ errors. If at most $t$ errors occur in the coordinates of the $i$th copy of $C_{\mathrm{in}}$, then the inner decoding step corrects these errors and the block no longer counts as an error in the outer code. If the number of errors is greater than $t$, then the $n$ symbols corresponding to the $i$th copy of $C_{\mathrm{in}}$ count as only one error for the outer decoder. One consequence of this two-step decoding process is that $C$ can correct any burst of length smaller than $n(s-1) + 2t + 1$, besides possibly correcting other random errors. Similarly, $C$ can correct multiple bursts, provided they are short enough: all errors are corrected as long as at most $s$ of the $N$ inner blocks receive more than $t$ errors.
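The single-burst bound can be checked by brute force under the error model described above: an inner block with at most $t$ errors is corrected, and any block with more than $t$ errors costs the outer decoder one symbol error. The Python sketch below (hypothetical names) slides one burst over all positions of a word made of $N$ inner blocks of length $n$.

```python
def failed_blocks(start, length, n, t):
    """Number of inner blocks (n symbols each) receiving more than t errors
    from a burst that covers positions start, ..., start + length - 1."""
    counts = {}
    for pos in range(start, start + length):
        counts[pos // n] = counts.get(pos // n, 0) + 1
    return sum(1 for c in counts.values() if c > t)


def burst_always_corrected(length, N, n, t, s):
    """True if, for every placement of the burst, at most s inner blocks fail,
    i.e. the outer decoder sees at most s symbol errors."""
    return all(failed_blocks(start, length, n, t) <= s
               for start in range(N * n - length + 1))
```

For example, with $N = 10$, $n = 7$, $t = 1$ and $s = 2$, every burst of length $n(s-1) + 2t = 9$ is corrected, whereas a burst of length 11 placed with two errors in each of two end blocks makes three blocks fail.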

Construction IV. In the two-dimensional case, concatenated codes can be interleaved in ways that have advantages against different types of error patterns. Let $b \mid N$ and $a \mid n$. For $j = 1, \ldots, n$, let $B_j$ denote the $\frac{N}{b} \times b$ array formed by the $j$th coordinates of all inner codewords,

$$B_j = \begin{pmatrix} c_{1,j} & \cdots & c_{b,j} \\ c_{b+1,j} & \cdots & c_{2b,j} \\ \vdots & & \vdots \\ c_{N-b+1,j} & \cdots & c_{N,j} \end{pmatrix},$$

and arrange the elements of $c$ in (4.1) in the $\frac{aN}{b} \times \frac{nb}{a}$ array

$$\begin{pmatrix} B_1 & \cdots & B_{n/a} \\ \vdots & \ddots & \vdots \\ B_{n-n/a+1} & \cdots & B_n \end{pmatrix}.$$

Using this interleaving pattern, any submatrix of size $\frac{N}{b} \times b$ contains exactly one symbol from each inner codeword. It is straightforward to see that this interleaving scheme can correct at least $t$ rectangular bursts of size $\frac{N}{b} \times b$, since it can correct $t$ such bursts using the inner codes alone. Additionally, this interleaving pattern can correct at least $s$ random errors.
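The index map of Construction IV, and the claim that every $\frac{N}{b} \times b$ window meets each inner codeword exactly once, can be checked with the following Python sketch (hypothetical names; 0-based array coordinates, 1-based $i, j$ as in (4.1)). The property holds because the position of $c_{i,j}$ inside its block $B_j$ depends only on $i$, and a window of block size covers every in-block position exactly once.

```python
def construction_iv_position(i, j, N, n, a, b):
    """0-based (row, col) of c_{i,j} in the (a*N/b) x (n*b/a) array of Construction IV."""
    rows_per_block, blocks_per_row = N // b, n // a
    block_row, block_col = divmod(j - 1, blocks_per_row)  # where block B_j sits
    r, c = divmod(i - 1, b)                               # where c_{i,j} sits inside B_j
    return block_row * rows_per_block + r, block_col * b + c


def build_iv(N, n, a, b):
    """Array whose cells hold the pair (i, j) of the symbol placed there."""
    grid = [[None] * (n * b // a) for _ in range(a * N // b)]
    for i in range(1, N + 1):
        for j in range(1, n + 1):
            r, c = construction_iv_position(i, j, N, n, a, b)
            grid[r][c] = (i, j)
    return grid


def window_codewords(grid, r0, c0, h, w):
    """Sorted inner-codeword indices i of the symbols inside an h x w window."""
    return sorted(grid[r][c][0] for r in range(r0, r0 + h) for c in range(c0, c0 + w))
```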

Construction V. Without interleaving, one could consider the following pattern. Let $a \mid N$ and $b \mid n$. For $i = 1, \ldots, N$, let $A_i$ denote the $\frac{n}{b} \times b$ array formed by the $i$th inner codeword,

$$A_i = \begin{pmatrix} c_{i,1} & \cdots & c_{i,b} \\ c_{i,b+1} & \cdots & c_{i,2b} \\ \vdots & & \vdots \\ c_{i,n-b+1} & \cdots & c_{i,n} \end{pmatrix},$$

and arrange the elements of $c$ in (4.1) in the $\frac{Nn}{ab} \times ab$ array

$$\begin{pmatrix} A_1 & \cdots & A_a \\ \vdots & \ddots & \vdots \\ A_{N-a+1} & \cdots & A_N \end{pmatrix}.$$

Using this pattern, any single burst of maximal size $\left((s_1 - 1)\frac{n}{b} + 1\right) \times \left((s_2 - 1)b + 1\right)$, where $s_1, s_2 \in \mathbb{N}$ with $s_1 s_2 \le s$, can be corrected by the outer code, since such a burst meets at most $s_1$ block rows and $s_2$ block columns, hence at most $s_1 s_2 \le s$ inner codewords; and as long as each $\frac{n}{b} \times b$ block that does not collide with the burst contains at most $t$ errors, all errors can be corrected. Therefore, this scheme works well if there is at most one larger burst, or a few smaller ones, together with many random errors spread more or less uniformly over the matrix.
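The corresponding check for Construction V counts how many distinct inner codewords an $h \times w$ burst can meet; with $h = (s_1 - 1)\frac{n}{b} + 1$ and $w = (s_2 - 1)b + 1$ the count should not exceed $s_1 s_2$. A Python sketch with hypothetical names and small illustrative parameters:

```python
def construction_v_position(i, j, N, n, a, b):
    """0-based (row, col) of c_{i,j} in the (N*n/(a*b)) x (a*b) array of
    Construction V: inner codeword i fills its own (n/b) x b block A_i."""
    block_row, block_col = divmod(i - 1, a)  # where block A_i sits
    r, c = divmod(j - 1, b)                  # where c_{i,j} sits inside A_i
    return block_row * (n // b) + r, block_col * b + c


def max_codewords_hit(N, n, a, b, h, w):
    """Largest number of distinct inner codewords touched by an h x w burst."""
    rows, cols = N * n // (a * b), a * b
    owner = {}
    for i in range(1, N + 1):
        for j in range(1, n + 1):
            owner[construction_v_position(i, j, N, n, a, b)] = i
    return max(len({owner[(r, c)] for r in range(r0, r0 + h) for c in range(c0, c0 + w)})
               for r0 in range(rows - h + 1) for c0 in range(cols - w + 1))
```

With $N = n = 6$, $a = 2$ and $b = 3$ (a $6 \times 6$ array of $2 \times 3$ blocks), a $3 \times 4$ burst ($s_1 = s_2 = 2$) meets at most 4 inner codewords, while a $4 \times 4$ burst can already meet 6.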

Note that Constructions II and III can be seen as special cases of Construction V, where the inner code is the trivial code, with $b = \sqrt{m}$ for Construction II, and with the encoding $\mathbb{F}_{q^m} \cong \mathbb{F}_q[P] \subset \mathbb{F}_q^{m \times m}$ and $b = m$ for Construction III.

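Construction VI. Consider another interleaving pattern, given by the $n \times N$ array (say $N > n$)

$$\begin{pmatrix} c_{1,1} & c_{2,1} & c_{3,1} & c_{4,1} & \cdots & c_{N,1} \\ c_{N,2} & c_{1,2} & c_{2,2} & c_{3,2} & \cdots & c_{N-1,2} \\ \vdots & & & & & \vdots \\ c_{N-n+2,n} & c_{N-n+3,n} & c_{N-n+4,n} & c_{N-n+5,n} & \cdots & c_{N-n+1,n} \end{pmatrix},$$

in which the $j$th row consists of the $j$th coordinates of all inner codewords, cyclically shifted by $j - 1$ positions. Each inner codeword is thus interleaved diagonally, and it is clear from the construction that any burst of size $1 \times n$ or $n \times 1$ corrupts at most one symbol of each inner codeword. Additionally, an error pattern consisting of a diagonal burst would corrupt an entire inner codeword but be treated as a single error by the outer decoder.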
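The diagonal structure of Construction VI can be verified mechanically: in 0-based coordinates, $c_{i,j}$ sits in row $j-1$ and column $(i + j - 2) \bmod N$. The Python sketch below (hypothetical names) builds the array of inner-codeword indices for small $N$ and $n$ and can be used to check the three properties stated above.

```python
def construction_vi_position(i, j, N):
    """0-based (row, col) of c_{i,j}: row j holds the j-th coordinates of all
    inner codewords, cyclically shifted right by j - 1 positions."""
    return j - 1, (i + j - 2) % N


def build_vi(N, n):
    """n x N array whose cells hold the index i of the inner codeword they belong to."""
    grid = [[0] * N for _ in range(n)]
    for i in range(1, N + 1):
        for j in range(1, n + 1):
            r, c = construction_vi_position(i, j, N)
            grid[r][c] = i
    return grid
```

For $N = 7$ and $n = 4$, every column and every run of 4 consecutive entries in a row contains 4 different inner codewords, while every wrapped diagonal belongs to a single inner codeword.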

For a fuzzy scheme in which the witness is prone to burst errors as well as random errors, an interleaving scheme like that in Construction IV may be advantageous. If fewer burst errors but more random errors occur, a scheme like the ones in Constructions V or VI might be better suited.

5. Conclusions. One of the big advantages of using burst-error-correcting codes lies in the decoding complexity, since we correct errors in words of a certain length using decoding procedures for codes of smaller length. For example, in Constructions I, II and III the decoding complexity is dominated by the Reed-Solomon decoder in the extension field. Using the standard Gorenstein-Peterson-Zierler decoding procedure, this is given by

$$O\left(n \cdot \left\lfloor \frac{n-k}{2} \right\rfloor^2\right) \subseteq O\left((q^m - 1)^3\right) = O\left(q^{3m}\right)$$

operations in $\mathbb{F}_{q^m}$. More recent algorithms for decoding cyclic codes [8] can even achieve a complexity of

$$O(n \log n) = O(m q^m).$$

Note that in Construction III the conversion of elements of $\mathbb{F}_q[P]$ to elements of $\mathbb{F}_q[\alpha]$ is more complex than the conversion of elements of $\mathbb{F}_q^m$ to elements of $\mathbb{F}_{q^m}$ needed in Constructions I and II, but its complexity is still polynomial in $m$ and thus does not change the overall complexity in Big-O notation.

The decoding complexity for Constructions IV, V and VI depends on the choice of the inner and outer codes. If, for example, one chooses an RS code over $\mathbb{F}_{q^k}$ for the outer code and a $q$-ary narrow-sense primitive BCH code of length $n = q^m - 1$ for the inner code, then the decoding complexity is

$$O(Nnt) = O\left((q^k - 1)(q^m - 1)t\right) \subseteq O\left(q^{k+m} t\right)$$

operations in $\mathbb{F}_{q^m}$, where $t$ is the error-correction capability of the inner code, plus $O(q^{3k})$ operations in $\mathbb{F}_{q^k}$.
Depending on the application, any of the six presented constructions can be advantageous for burst error correction in the syndrome fuzzy hashing scenario. They differ in the type and number of correctable errors as well as in the decoding complexity for a given codeword size in the $q$-ary expansion. Naturally, this list of constructions is not complete, and there exist other codes that can be useful for burst error correction in the storage of noisy data.

We conclude with a list of recommendations on the situations in which the above constructions can be used:

• Constructions I and II are relatively fast to decode (as the inner code is trivial), while having a good burst error correction capability. They can also correct a few random errors, but not too many.

• Construction III can correct bursts about as well as Construction II, and compensates for its relatively lower rate with a faster decoding procedure, which can be relevant when deploying the syndrome fuzzy hashing scheme on embedded devices.

• Construction IV works well with several larger bursts, but not too many random errors.

• Construction V works well with a few large bursts and many sufficiently uniformly distributed random errors.

• Construction VI is well suited for rectangular bursts of size $k \times \ell$ with $k$ much smaller than $\ell$ (or vice versa), and for random errors.